Abstract: Plagiarism detection is an important research problem, particularly for identifying duplicate documents in MEDLINE. In this paper, we compare documents at the level of sentence meaning: a given document is matched against a collection of documents so that sentences with similar meanings can be identified easily. Each word may have multiple meanings, map to multiple concepts (Concept Unique Identifiers, CUIs), and have several alternative words, all of which must be handled when comparing documents. Information Retrieval-based plagiarism detection for MEDLINE has two stages: candidate document selection and detailed analysis. The first stage, candidate document selection, identifies a set of candidate sources from a document collection. The second stage, detailed analysis, performs a complete comparison of the suspicious document with all candidates to identify similar sections. Vocabulary mismatch between the selected suspicious document and the candidates is addressed using Query Expansion, which is based on the UMLS Metathesaurus and Word Sense Disambiguation. Candidate document selection is performed using the Kullback-Leibler Distance.
Keywords: Information Retrieval, Kullback-Leibler Distance, MEDLINE, Plagiarism Detection, UMLS Metathesaurus, Word Sense Disambiguation.
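
As an illustration of the candidate document selection stage described in the abstract, the following is a minimal sketch of ranking source documents by Kullback-Leibler distance from a suspicious document. It is not the implementation used in this paper: the function names (unigram_distribution, kl_distance, rank_candidates), the whitespace tokenization, the additive smoothing constant, and the use of the plain (asymmetric) KL divergence over unigram distributions are all assumptions made for the example, and no query expansion or word sense disambiguation is applied.

```python
import math
from collections import Counter


def unigram_distribution(text, vocabulary, smoothing=1e-3):
    """Smoothed unigram distribution of `text` over a shared vocabulary.

    Additive smoothing (an assumption of this sketch) keeps every
    probability non-zero so the logarithm below is always defined.
    """
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocabulary) + smoothing * len(vocabulary)
    return {w: (counts[w] + smoothing) / total for w in vocabulary}


def kl_distance(p, q):
    """Plain KL divergence D(p || q) = sum_w p(w) * log(p(w) / q(w))."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def rank_candidates(suspicious, sources, top_k=2):
    """Return the ids of the `top_k` sources closest to the suspicious text."""
    # Build one vocabulary shared by the suspicious text and all sources.
    vocabulary = set(suspicious.lower().split())
    for text in sources.values():
        vocabulary |= set(text.lower().split())

    p = unigram_distribution(suspicious, vocabulary)
    scored = [
        (kl_distance(p, unigram_distribution(text, vocabulary)), doc_id)
        for doc_id, text in sources.items()
    ]
    # A smaller distance means a more similar word distribution.
    return [doc_id for _, doc_id in sorted(scored)[:top_k]]


if __name__ == "__main__":
    # Hypothetical toy collection, not data from the paper.
    toy_sources = {
        "a": "aspirin reduces fever and pain in adults",
        "b": "gene expression in yeast cells under stress",
    }
    print(rank_candidates("aspirin reduces fever and pain in adult patients",
                          toy_sources, top_k=1))
```

In this sketch the sources with the smallest distance are kept as candidates for the detailed analysis stage. Note that "adults" and "adult" are treated as unrelated tokens here, which is the kind of vocabulary mismatch that query expansion is meant to reduce.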